Name | Version | Summary | date |
document-data-extractor |
1.0.4 |
Best open-source document to markdown extractor for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract |
2025-07-29 08:25:56 |
llm-data-converter |
2.2.0 |
Best open-source document to markdown converter for LLM training data. Convert PDF, Word, PowerPoint, Excel, images, URLs to clean markdown, JSON, HTML locally. Alternative to Unstructured, Docling, Marker, MarkItDown, MinerU, PaddleOCR, Tesseract |
2025-07-25 13:32:07 |
tikara |
0.1.5 |
The metadata and text content extractor for almost every file type. |
2025-01-26 23:33:40 |
litepali |
0.0.5 |
Lightweight ColPali-based retrieval for cloud |
2024-09-15 18:02:09 |